Skip to content

fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016

Open
ChenyqThu wants to merge 1 commit into
garrytan:masterfrom
ChenyqThu:upstream-fix/google-recipe-max-batch-tokens
Open

fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016
ChenyqThu wants to merge 1 commit into
garrytan:masterfrom
ChenyqThu:upstream-fix/google-recipe-max-batch-tokens

Conversation

@ChenyqThu
Copy link
Copy Markdown

Why

google is the only first-party embedding recipe still missing max_batch_tokens after v0.32 (#779) landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) see the warning on every gbrain query, every kos-compat-api /ingest response, and every cron gbrain invocation:

[ai.gateway] recipe "google" declares an embedding touchpoint without max_batch_tokens; recursion is the only safety net for batch caps.

For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting — one wasted HTTP round-trip per oversized batch.

The existing test test/ai/no-batch-cap-suppression.serial.test.ts even explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.

Patch

Three field additions to google.touchpoints.embedding:

max_batch_tokens: 20_000,
chars_per_token: 2,
// safety_factor left at gateway default 0.8

Rationale

  • max_batch_tokens: 20_000 — Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap before gemini-embedding-001 starts emitting 429s.
  • chars_per_token: 2 — English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.
  • safety_factor — left at gateway default 0.8. Pre-split lands at 20 000 × 0.8 / 2 = 8 000 chars/batch.

Test follow-through

Two regression tests pinned google as the canary "real provider with no cap declared":

  • test/ai/no-batch-cap-suppression.serial.test.ts: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant — no first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
  • test/ai/adaptive-embed-batch.test.ts: checked contractMatch.length > 0. After this patch the canary set is empty → toBe(0). The once-per-process suppression mechanism is still exercised by the firstCallCount stability check earlier in the same test.

Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.

Validation

  • bun run typecheck clean
  • bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

Field repro

Filed downstream from the Jarvis KOS v2 fork (ChenyqThu/jarvis-knowledge-os-v2), where the google native gateway is the default embedding provider since the v0.27 M3 cutover. The warning has been load-bearing log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log) — small fork-local patch carried while upstream evaluates.

🤖 Generated with Claude Code

google was the only first-party embedding recipe still missing
max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup
warning. Operators routing through google:gemini-embedding-001 (the
default-provider path after v0.27 native gateway) saw the warning on
every `gbrain query`, every kos-compat-api / MCP `/ingest` response,
and every cron `gbrain` invocation. For CJK-dense or large-payload
batches the absent field also forced the gateway to discover Google's
per-request token cap reactively via recursive halving instead of
pre-splitting.

Declared:
- max_batch_tokens: 20_000 — Google's per-text cap is 2048 tokens;
  ~20k tokens/request is the soft cap before gemini-embedding-001
  starts emitting 429s.
- chars_per_token: 2 — CJK density on mixed corpora (English averages
  ~4, CJK ~1.5; 2 keeps pre-split safe for both).
- safety_factor left at gateway default 0.8 → pre-split lands at
  ~8 000 chars/batch, well under any per-request floor Google
  publishes.

Two existing regression tests pinned google as the canary "real
provider with no cap declared":

- test/ai/no-batch-cap-suppression.serial.test.ts assumed google
  STILL warned (the comment explicitly called it a fixed-cap model
  waiting for someone to cap it). With this patch google joins the
  capped set, so the test flips to assert the strong invariant: NO
  first-party recipe warns, because every native and openai-compat
  recipe now declares either max_batch_tokens or no_batch_cap.

- test/ai/adaptive-embed-batch.test.ts checked
  `contractMatch.length > 0`. After this patch the canary set is
  empty, so `toBe(0)`. The once-per-process suppression mechanism is
  still exercised by the `firstCallCount` stability check earlier in
  the same test.

Validation:
- bun run typecheck clean
- bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail
  pre-patch, expected: the two tests above)
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens)
also touches two regression tests that pinned google as the canary
"real provider with no cap declared". Without backporting them to
fork master, `bun test test/ai/` would 2-fail on this tree even
though google.ts has been capped here since af2a806.

- no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns
  for google" test with a stronger invariant — NO first-party recipe
  warns, because every native/openai-compat recipe now declares
  either max_batch_tokens or no_batch_cap.
- adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0`
  to `=== 0` and adds google to the "must be absent from warnings"
  assertion list (sibling to voyage + openai).

Validation: `bun test test/ai/` → 144 pass / 0 fail.

Also: TODO.md updates link PR-1016 + PR-1017 from the related
entries; PR-2 oauth-bootstrap entry flips to FILED with the same
patch metadata.

On upstream merge of either PR, the next sync auto-drops these
local edits via clean text merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session:
diagnostic correction (the historical 117k stderr "fire" that wasn't),
two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017
oauth_clients forward-bootstrap), CJK keyword-only eval verdict,
overlap-matrix verdict, M2-B + M2-C verdicts, mechanical cleanup
(48 sync_failures ack'd, bun-test hang root-caused), and the new
[facts:absorb] sub-process DB-connection latent bug. 8 TODO items
closed, 1 new entry filed.

CLAUDE.md "Chinese-first knowledge base" rule tightened per the
CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars
without whitespace) requires vector*; English and 2-3 char standalone
CJK match fine on keyword. Operational guidance unchanged — the modal
operator query on this brain is a compound CJK phrase that depends on
vector being live.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 20, 2026
v0.37.0.0 upstream sync (11 versions, 12 commits, v0.35.6.0 → v0.37.0.0).

Conflicts (4) resolved:
- CLAUDE.md: keep fork (upstream content lives in docs/CLAUDE-UPSTREAM.md;
  selected new sections will be ported in Phase 7).
- llms-full.txt: take upstream then bun run build:llms in Phase 2.
- test/ai/adaptive-embed-batch.test.ts: take upstream — v0.36.1.1 garrytan#1083
  cherry-pick superseded our PR garrytan#1016 backport tests.
- test/ai/no-batch-cap-suppression.serial.test.ts: take upstream — same.

Auto-merged cleanly: VERSION (→ 0.37.0.0), CHANGELOG.md, README.md,
package.json, bun.lock, .gitignore, skills/RESOLVER.md (fork
## KOS-Jarvis extensions section preserved at L128, upstream
skillpack-harvest entry added at L62), skills/manifest.json (47 total,
fork 4 KOS-Jarvis entries preserved), src/core/pglite-engine.ts (WAL
fork patch survived at L207 — line shift from L200 due to upstream
import additions).

Fork-protected paths (skills/kos-jarvis/, server/, workers/,
scripts/launchd/) untouched — verified via diff master..HEAD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 20, 2026
Phase 1 conflict resolution took upstream versions of these two
tests assuming v0.36.1.1 garrytan#1083 cherry-pick was the equivalent of
fork PR garrytan#1016. Wrong: v0.36.1.1 went with the "warn only for
the configured embedding provider" filter path (gateway.ts),
while fork kept the "declare max_batch_tokens on google recipe"
path (google.ts +7 lines, retained through merge).

Net effect of both paths together:
- gateway.ts warning filter narrows to configured provider (upstream)
- google.ts has max_batch_tokens declared so won't warn anyway (fork)

Fork tests expect 0 warnings; upstream tests expect 1 (google when
google configured). Took fork view back — google.ts business value
(pre-split CJK batches via max_batch_tokens=20_000 + chars_per_token=2,
reduces reactive token-cap discovery overhead) justifies retaining
the fork patch.

Verification: bun test test/ai/ → 224 pass / 0 fail / 764 expect().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 20, 2026
JARVIS-ARCHITECTURE.md §6.29:
- 11 versions overview (v0.35.6.0 → v0.37.0.0, 12 commits, 333 files)
- 4-conflict resolution log (test/ai/ reverted to fork after finding
  v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016)
- Schema v66 → v78 migration evidence (12 migrations auto-bootstrap)
- ZeroEntropy lock procedure (config set ze_switch_declined_at +
  ze_switch_prompt_shown=true since CLI has no --decline flag)
- Health score 70 → 80 evidence + new v0.36.x checks all PASS
- Smoke evidence (typecheck/test/curl/query Chinese+English/patrol)

CLAUDE.md: Latest sync story pointer updated to §6.29.

TODO.md:
- Top intro replaced with v0.37.0.0 sync context
- L824 PR garrytan#1016 verdict revised — NOT superseded by v0.36.1.1
- Added 5 new P1 entries (post-v0.37.0.0):
  - dream-cycle LLM spend observation (calibration phases)
  - embedding_columns registry declare (optional)
  - image ingestion roadmap decision
  - doctor --remediate vs kos-patrol evaluation pointer
  - typed-claim fence for OH transcripts
- Branch cleanup deferred to Lucien (worktree refs block delete)

llms-full.txt regenerated via bun run build:llms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 20, 2026
…12 commits, v0.35.6.0 → v0.37.0.0)

333 files / +52455 / -3317 LoC. 4 conflicts resolved (CLAUDE.md /
llms-full.txt / 2 test/ai/ files — last 2 reverted to fork view after
discovering v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016).

Schema v66 → v78 auto-bootstrap (12 migrations, zero manual ALTER).
WAL fork patch survived at src/core/pglite-engine.ts:207. ZeroEntropy
switch lock applied (ze_switch_declined_at + ze_switch_prompt_shown).
Doctor health_score 70 → 80; 3140 pages preserved; vector(1536) unchanged.

Smoke: bun test test/ai/ 224/224 green. kos.chenge.ink/health returns
0.37.0.0 + engine=postgres. kos-patrol kickstart exit 0, dashboard
written. Chinese + English query smoke top-hits in healthy band.

Fork dirs unchanged (10). Branch cleanup deferred (worktree refs).
PR garrytan#1016 verdict revised: NOT superseded — upstream filter path
co-exists with fork declare-max_batch_tokens path.

Full story: docs/JARVIS-ARCHITECTURE.md §6.29.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant